Mixtures of IBM Model 2

Authors

  • Jorge Civera
  • Alfons Juan
Abstract

Mixture modelling is a standard pattern classification technique, yet its use in statistical machine translation remains unexplored. The mixture approach has two main advantages: first, its flexibility in finding an appropriate tradeoff between model complexity and the amount of training data available; and second, its capability to learn specific probability distributions that better fit subsets of the training dataset. The latter advantage is even more important in statistical machine translation, since it is well known that most current translation models are applicable only to restricted semantic domains. In this paper, we describe a mixture extension of IBM Model 2, along with the maximum likelihood estimation of its parameters through the EM algorithm and a dynamic-programming decoding algorithm for this mixture model. Preliminary experiments carried out on the Tourist task show that the mixture extension yields a decrease in word-error rate of up to 15%.
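
As a rough illustration of the kind of model the abstract describes, the sketch below computes the likelihood of a source sentence under a mixture of IBM Model 2 components, together with the component posteriors that an EM E-step would need. It is not the authors' implementation. The function names, the dictionary-based parameter tables, the assumption that the mixture weights do not depend on the target sentence, and the omission of the NULL word and the sentence-length model are all simplifications made here for illustration.

```python
from math import prod


def ibm2_likelihood(src, tgt, lex, align):
    """Likelihood of src given tgt under one IBM Model 2 component.

    lex[(x, y)]         : lexical probability of source word x given target word y
    align[(i, j, J, I)] : probability that source position j aligns to target
                          position i, given sentence lengths J and I
    (Hypothetical table layout; missing entries are treated as probability 0.)
    """
    J, I = len(src), len(tgt)
    # Product over source positions of the sum over target positions.
    return prod(
        sum(align.get((i, j, J, I), 0.0) * lex.get((x, tgt[i]), 0.0)
            for i in range(I))
        for j, x in enumerate(src)
    )


def mixture_likelihood(src, tgt, weights, components):
    """Weighted sum of the per-component IBM Model 2 likelihoods."""
    return sum(w * ibm2_likelihood(src, tgt, lex, align)
               for w, (lex, align) in zip(weights, components))


def component_posteriors(src, tgt, weights, components):
    """E-step quantity: posterior probability of each component for one pair."""
    joint = [w * ibm2_likelihood(src, tgt, lex, align)
             for w, (lex, align) in zip(weights, components)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("sentence pair has zero probability under every component")
    return [p / total for p in joint]


# Toy data (invented for illustration): one source word, two target words.
src, tgt = ["a"], ["A", "B"]
comp1 = ({("a", "A"): 0.8, ("a", "B"): 0.2},
         {(0, 0, 1, 2): 0.5, (1, 0, 1, 2): 0.5})
comp2 = ({("a", "A"): 0.2, ("a", "B"): 0.6},
         {(0, 0, 1, 2): 1.0, (1, 0, 1, 2): 0.0})
weights, components = [0.5, 0.5], [comp1, comp2]

print(mixture_likelihood(src, tgt, weights, components))
print(component_posteriors(src, tgt, weights, components))
```

In a full EM procedure, these posteriors would weight the expected alignment counts collected for each component before the M-step re-estimates the weights and tables; that part is left out of this sketch.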


Similar resources

Correlations and Predictions of THF + 2-Alkanol Binary Mixtures Behaviour by PC-SAFT Model and Friction Theory

In this article, the behavior of tetrahydrofuran (THF) + 2-alkanol binary mixtures, namely with 2-propanol, 2-butanol, 2-pentanol, 2-hexanol and 2-heptanol, has been studied through density and viscosity measurements as a function of composition and within the temperature range of 293.15–313.15 K. The excess molar volume, isobaric thermal expansivity, partial molar volumes, and viscosity deviation...


Calculation of the Low-Lying Energy Levels of Even-Even Isotopes of Cadmium, Tin and Tellurium in the Framework of the Interacting Boson Model (IBM-1)

The dynamical symmetries in even-even nuclei were investigated by Arima and Iachello in 1974, and led to a model called the "Interacting Boson Model (IBM)". In this article we have outlined some basic ideas used in IBM-1 and carried out the calculations for low-lying energy levels of even-even isotopes of Cd, Sn and Te via the PHINT code. The calculations for energy and quadrupole moment t...


A SIMPLE MODEL FOR THE ESTIMATION OF DIELECTRIC CONSTANTS OF BINARY SOLVENT MIXTURES

A simple and reliable method for quick estimation of the dielectric constant of a binary solvent mixture is proposed. The validity of the proposed method has been tested for a broad range of binary solvent mixtures.


Solubility Prediction of Anthracene in Non-Aqueous Solvent Mixtures Using Jouyban-Acree Model

A quantitative structure-property relationship was proposed to calculate the binary interaction terms of the Jouyban-Acree model using the solubility parameter, boiling point, vapour pressure and density of solvents. The applicability of the proposed method for reproducing solubility data of anthracene in binary solvents has been evaluated using 116 solubility data sets collected from the lite...


Predicting Flow Number of Asphalt Mixtures Based on the Marshall Mix Design Parameters Using Multivariate Adaptive Regression Spline (MARS)

Rutting is one of the major distresses in flexible pavements, and it is heavily influenced by the properties of asphalt mixtures at high temperatures. There are several methods for characterizing the rutting resistance of asphalt mixtures. Flow number is one of the most important parameters that can be used for the evaluation of rutting. The flow number is measured by the dynamic creep...


The Prediction of Surface Tension of Ternary Mixtures at Different Temperatures Using Artificial Neural Networks

In this work, an artificial neural network (ANN) has been employed to propose a practical model for predicting the surface tension of multi-component mixtures. In order to develop a reliable model based on the ANN, a comprehensive experimental data set including 15 ternary liquid mixtures at different temperatures was employed. These systems consist of 777 data points generally containing hydrocar...




Publication date: 2006